logic puzzle
Bridging Natural Language and ASP: A Hybrid Approach Using LLMs and AMR Parsing
Connar Hite, Sean Saud, Raef Taha, Nayim Rahman, Tanvir Atahary, Scott Douglass, Tarek Taha
Answer Set Programming (ASP) is a declarative programming paradigm based on logic programming and non-monotonic reasoning. It is a tremendously powerful tool for describing and solving combinatorial problems. Like any other language, ASP requires users to learn its semantics and syntax, yet it is increasingly necessary for people unfamiliar with programming languages to interact with code. This paper proposes a novel method of translating unconstrained English into ASP programs for logic puzzles using an LLM and Abstract Meaning Representation (AMR) graphs. ASP rules, facts, and constraints are all generated to fully represent and solve the desired problem. Example logic puzzles are used to demonstrate the capabilities of the system. While most current methods rely entirely on an LLM, our system restricts the LLM to straightforward tasks: simplifying natural language sentences, identifying keywords, and generating simple facts. AMR graphs are then parsed from the simplified language and used to systematically generate ASP constraints. The system successfully creates an entire ASP program that solves a combinatorial logic problem. This approach is a significant first step toward a lighter-weight, explainable system that converts natural language into programs that solve complex logic problems.
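To make the target representation concrete, here is a minimal sketch of the kind of program such a system would emit for a toy puzzle: facts, a choice rule, and integrity constraints, solved through clingo's Python API. The puzzle, the predicate names, and the use of the `clingo` package are illustrative assumptions, not details taken from the paper.

```python
# A toy ASP encoding of a logic puzzle, solved with clingo (pip install clingo).
import clingo

PROGRAM = """
person(alice; bob; carol).
drink(tea; coffee; milk).

% Each person gets exactly one drink (choice rule).
1 { has(P, D) : drink(D) } 1 :- person(P).

% No two people share a drink (integrity constraint).
:- has(P1, D), has(P2, D), P1 != P2.

% Clues: "Bob drinks tea" (fact), "Alice does not drink coffee" (constraint).
has(bob, tea).
:- has(alice, coffee).
"""

ctl = clingo.Control(["0"])            # "0": enumerate all answer sets
ctl.add("base", [], PROGRAM)
ctl.ground([("base", [])])
ctl.solve(on_model=lambda m: print("Answer set:", m))
```

Running this yields the single answer set in which Bob has tea, Alice has milk, and Carol has coffee; the clues are exactly the part a translation pipeline must derive from English.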
Steven Pinker's new book shows how he's become a contradictory figure
Steven Pinker's new book When Everyone Knows That Everyone Knows makes a compelling case for common knowledge, and it perfectly encapsulates what a contradictory figure he has become. Much of it is a clear, fascinating explanation of a major psychological phenomenon. But then he starts telling you what he thinks about current affairs, arguing, for instance, that "cancel culture" is a form of censorship. Pinker is a psychologist at Harvard University who has written a string of popular science books. Some, like Words and Rules, are rooted in his own research and are a good read.
LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning
Zhen Hao Wong, Jingwen Deng, Runming He, Zirong Chen, Qijie You, Hejun Dong, Hao Liang, Chengyu Shen, Bin Cui, Wentao Zhang
Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific heuristics rather than fostering general-purpose thinking strategies. In this work, we propose a "play to learn" framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles, each designed to cultivate distinct reasoning skills such as constraint propagation, spatial consistency, and symbolic deduction. Using a reinforcement learning setup with verifiable rewards, models receive binary feedback based on puzzle correctness, encouraging iterative, hypothesis-driven problem solving. We demonstrate that this training approach significantly improves out-of-distribution performance on a range of mathematical benchmarks, especially for mid-difficulty problems that require multi-step reasoning. Analyses across problem categories and difficulty levels reveal that puzzle training promotes transferable reasoning routines, strengthening algebraic manipulation, geometric inference, and combinatorial logic, while offering limited gains on rote or highly specialized tasks. These findings show that reinforcement learning over logic puzzles reshapes the internal reasoning of LLMs, enabling more robust and compositional generalization without relying on task-specific symbolic tools.
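As a concrete illustration of the reward signal described above, here is a minimal, self-contained sketch: the reward is 1.0 exactly when the emitted answer satisfies every puzzle constraint, and 0.0 otherwise. The puzzle encoding below is a hypothetical stand-in, not the paper's actual format.

```python
# Binary "verifiable reward": feedback depends only on puzzle correctness,
# with no learned reward model in the loop.
def verifiable_reward(constraints, answer: dict) -> float:
    """Return 1.0 iff the answer satisfies every puzzle constraint."""
    return 1.0 if all(c(answer) for c in constraints) else 0.0

# Example: a two-variable toy constraint-satisfaction puzzle.
constraints = [
    lambda a: a["x"] != a["y"],        # values must differ
    lambda a: a["x"] + a["y"] == 5,    # values must sum to 5
]
print(verifiable_reward(constraints, {"x": 2, "y": 3}))  # 1.0
print(verifiable_reward(constraints, {"x": 2, "y": 2}))  # 0.0
```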
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL
Yichen Feng, Zhangchen Xu, Fengqing Jiang, Yuetai Li, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran
Vision language models (VLMs) are expected to perform effective multimodal reasoning and make logically coherent decisions, which is critical to tasks such as diagram understanding and spatial problem solving. However, current VLM reasoning lacks large-scale and well-structured training datasets. To bridge this gap, we propose VisualSphinx, a first-of-its-kind large-scale synthetic visual logical reasoning training dataset. To tackle the challenge of synthesizing images with grounded answers, we propose a rule-to-image synthesis pipeline, which extracts and expands puzzle rules from seed questions and generates image-synthesis code for puzzle sample assembly. Experiments demonstrate that VLMs trained using GRPO on VisualSphinx benefit from the logical coherence and readability of our dataset and exhibit improved performance on logical reasoning tasks. The enhanced reasoning capabilities developed on VisualSphinx also transfer to other reasoning tasks such as algebraic, arithmetic, and geometry reasoning.
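The rule-to-image idea can be sketched in a few lines: expand a seed rule into variants, then generate each puzzle together with its grounded answer. All function names here are assumptions, and the "rendering" is a textual shape sequence rather than a real image, purely for illustration.

```python
# Illustrative skeleton of a rule-to-puzzle pipeline with grounded answers.
def expand_rule(seed_rule):
    """Produce perturbed variants of a seed puzzle rule."""
    return [{"shapes": seed_rule["shapes"], "step": k} for k in (1, 2, 3)]

def render(rule, length=4):
    """'Render' the rule as a shape sequence; the answer is the next item."""
    seq = [rule["shapes"][(i * rule["step"]) % len(rule["shapes"])]
           for i in range(length + 1)]
    return seq[:length], seq[length]   # (puzzle panels, grounded answer)

seed = {"shapes": ["circle", "square", "triangle"]}
for variant in expand_rule(seed):
    panels, answer = render(variant)
    print(panels, "->", answer)
```

Because the answer is computed from the same rule that generated the panels, every synthetic sample comes with a verifiable label, which is what makes such data usable for RL.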
Causal language modeling can elicit search and reasoning capabilities on logic puzzles
Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. However, the extent to which fundamental search and reasoning capabilities have emerged within LLMs remains a topic of ongoing debate. In this work, we study whether causal language modeling can learn a complex task such as solving Sudoku puzzles. To solve a Sudoku, the model must first search over all empty cells of the puzzle to decide which cell to fill, and then apply an appropriate strategy to fill it. Sometimes, applying a strategy only thins down the possible values for a cell rather than concluding its exact value.
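The "thinning" step the abstract mentions is ordinary constraint propagation; a minimal sketch is below. Candidates are computed per cell, and a cell is only filled when its candidate set narrows to a single value (the "naked single" strategy); repeated passes implement the propagation. The grid encoding (0 for empty) is an assumption for illustration.

```python
# Candidate thinning for a 9x9 Sudoku grid (lists of lists, 0 = empty).
def candidates(grid, r, c):
    """Values not yet ruled out for cell (r, c) by its row, column, and box."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return set(range(1, 10)) - used

def fill_singletons(grid):
    """One pass of 'naked single': fill cells with exactly one candidate."""
    progress = False
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cand = candidates(grid, r, c)
                if len(cand) == 1:
                    grid[r][c] = cand.pop()
                    progress = True
    return progress   # loop until False; remaining cells need other strategies
```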
LR$^2$Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems
Jianghao Chen, Zhenlin Wei, Zhenjiang Ren, Ziyong Li, Jiajun Zhang
Recent progress in o1-like models has significantly enhanced the reasoning abilities of Large Language Models (LLMs), empowering them to tackle increasingly complex tasks through reflection capabilities, such as making assumptions, backtracking, and self-refinement. However, effectively evaluating such reflection capabilities remains challenging due to the lack of appropriate benchmarks. To bridge this gap, we introduce LR$^2$Bench, a novel benchmark designed to evaluate the Long-chain Reflective Reasoning capabilities of LLMs. LR$^2$Bench comprises 850 samples across six Constraint Satisfaction Problems (CSPs) where reflective reasoning is crucial for deriving solutions that meet all given constraints. Each type of task focuses on distinct constraint patterns, such as knowledge-based, logical, and spatial constraints, providing a comprehensive evaluation of diverse problem-solving scenarios. We conduct an extensive evaluation of both conventional models and o1-like models. Our experimental results reveal that even the most advanced reasoning-specific models, such as DeepSeek-R1 and OpenAI o1-preview, struggle with tasks in LR$^2$Bench, achieving average Exact Match scores of only 20.0% and 23.6%, respectively. These findings underscore the significant room for improvement in the reflective reasoning capabilities of current LLMs. The leaderboard of our benchmark is available at https://huggingface.co/spaces/UltraRonin/LR2Bench
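The assume / check / backtrack loop that these CSP tasks are meant to probe can be written down directly; a plain backtracking solver is sketched below. The variable and constraint encoding is illustrative, not the benchmark's format.

```python
# Backtracking CSP search: assume a value, check constraints, backtrack on failure.
def solve(variables, domains, constraints, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment                      # complete, consistent assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value                # make an assumption
        if all(c(assignment) for c in constraints):
            result = solve(variables, domains, constraints, assignment)
            if result:
                return result
        del assignment[var]                    # backtrack / self-refine
    return None

# Example: two variables that must differ and sum to 4. The guard lets the
# constraint pass while the assignment is still partial.
def ok(a):
    if "x" not in a or "y" not in a:
        return True
    return a["x"] != a["y"] and a["x"] + a["y"] == 4

print(solve(["x", "y"], {"x": [1, 2, 3], "y": [1, 2, 3]}, [ok]))
# -> {'x': 1, 'y': 3}
```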
Beyond Interpolation: Extrapolative Reasoning with Reinforcement Learning and Graph Neural Networks
Niccolò Grillo, Andrea Toccaceli, Joël Mathys, Benjamin Estermann, Stefania Fresca, Roger Wattenhofer
Despite incredible progress, many neural architectures fail to properly generalize beyond their training distribution. As such, learning to reason in a correct and generalizable way is one of the current fundamental challenges in machine learning. In this respect, logic puzzles provide a great testbed, as we can fully understand and control the learning environment. Thus, they allow us to evaluate performance on previously unseen, larger, and more difficult puzzles that follow the same underlying rules. Since traditional approaches often struggle to represent such scalable logical structures, we propose to model these puzzles using a graph-based approach. Then, we investigate the key factors enabling the proposed models to learn generalizable solutions in a reinforcement learning setting. Our study focuses on the impact of the architecture's inductive bias, different reward systems, and the role of recurrent modeling in enabling sequential reasoning. Through extensive experiments, we demonstrate how these elements contribute to successful extrapolation on increasingly complex puzzles. These insights and frameworks offer a systematic way to design learning-based systems capable of generalizable reasoning beyond interpolation.
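A graph encoding of this kind can be made concrete with Sudoku: cells become nodes and mutual-exclusion relations become edges, so the same construction scales to larger puzzle sizes. This is a generic sketch of the idea, not the paper's actual model.

```python
# Constraint graph of an n^2 x n^2 Sudoku: nodes are cells, edges connect
# cells that must hold different values (same row, column, or box).
def sudoku_graph(n=3):
    size = n * n
    def peers(r, c):
        for i in range(size):
            if i != c: yield (r, c), (r, i)               # same row
            if i != r: yield (r, c), (i, c)               # same column
        br, bc = n * (r // n), n * (c // n)
        for i in range(br, br + n):
            for j in range(bc, bc + n):
                if (i, j) != (r, c): yield (r, c), (i, j)  # same box
    return {frozenset(e) for r in range(size) for c in range(size)
            for e in peers(r, c)}

print(len(sudoku_graph(2)))  # 4x4 puzzle: 56 undirected constraint edges
```

Because only `n` changes between puzzle sizes, a GNN trained on small instances can be evaluated on larger ones, which is precisely the extrapolation setting the abstract describes.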
AI tools like ChatGPT and Google's Gemini are 'irrational' and prone to making simple mistakes, study finds
While you might expect AI to be the epitome of cold, logical reasoning, researchers now suggest that these systems might be even more illogical than humans. Researchers from University College London put seven of the top AIs through a series of classic tests of human reasoning. Even the best-performing AIs were found to be irrational and prone to simple mistakes, with most models getting the answer wrong more than half the time. However, the researchers also found that these models weren't irrational in the same way as a human, while some even refused to answer logic questions on 'ethical grounds'. Olivia Macmillan-Scott, a PhD student at UCL and lead author on the paper, says: 'Based on the results of our study and other research on Large Language Models, it's safe to say that these models do not "think" like humans yet.'
Leveraging Large Language Models to Generate Answer Set Programs
Adam Ishay, Zhun Yang, Joohyung Lee
Large language models (LLMs), such as GPT-3 and GPT-4, have demonstrated exceptional performance in various natural language processing tasks and have shown the ability to solve certain reasoning problems. However, their reasoning capabilities are limited and relatively shallow, despite the application of various prompting techniques. In contrast, formal logic is adept at handling complex reasoning, but translating natural language descriptions into formal logic is a challenging task that non-experts struggle with. This paper proposes a neuro-symbolic method that combines the strengths of large language models and answer set programming. Specifically, we employ an LLM to transform natural language descriptions of logic puzzles into answer set programs. We carefully design prompts for an LLM to convert natural language descriptions into answer set programs in a step-by-step manner. Surprisingly, with just a few in-context learning examples, LLMs can generate reasonably complex answer set programs. The majority of errors made are relatively simple and can be easily corrected by humans, thus enabling LLMs to effectively assist in the creation of answer set programs.
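The few-shot, step-by-step prompting pattern described here is easy to sketch; the example below is an illustrative reconstruction, not the authors' actual prompts. Each in-context example pairs a puzzle description with its ASP encoding, broken into the same steps the new puzzle should follow.

```python
# Illustrative few-shot prompt scaffolding for NL -> ASP translation.
FEW_SHOT = """\
Problem: Three friends each own a different pet (cat, dog, fish). Sam does not own the dog.
Step 1 (constants): friend(sam; alex; pat). pet(cat; dog; fish).
Step 2 (choice rule): 1 { owns(F, P) : pet(P) } 1 :- friend(F).
Step 3 (uniqueness): :- owns(F1, P), owns(F2, P), F1 != F2.
Step 4 (clues): :- owns(sam, dog).
"""

def build_prompt(puzzle_text: str) -> str:
    """Append the new puzzle after the worked example(s)."""
    return (FEW_SHOT
            + f"\nProblem: {puzzle_text}\n"
            + "Translate this problem into an answer set program, "
              "following the same steps.")

print(build_prompt("Four students each take a different class..."))
```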
Exploiting Asymmetry in Logic Puzzles: Using ZDDs for Symbolic Model Checking Dynamic Epistemic Logic
Daniel Miedema, Malvin Gattinger
Binary decision diagrams (BDDs) are widely used to mitigate the state-explosion problem in model checking. Zero-suppressed Decision Diagrams (ZDDs) are a variation of BDDs that omit variables that must be false, instead of omitting variables that do not matter. We use ZDDs to symbolically encode Kripke models used in Dynamic Epistemic Logic, a framework to reason about knowledge and information dynamics in multi-agent systems. We compare the memory usage of different ZDD variants for three well-known examples from the literature: the Muddy Children, the Sum and Product puzzle, and the Dining Cryptographers. Our implementation is based on the existing model checker SMCDEL and the CUDD library. Our results show that replacing BDDs with the right variant of ZDDs can significantly reduce memory usage. This suggests that ZDDs are a useful tool for model checking multi-agent systems.
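The contrast between the two reduction rules can be shown in a few lines. In the toy sketch below, nodes are `(var, lo, hi)` triples over the terminals 0 and 1: the BDD rule skips a node whose branches coincide (the variable does not matter), while the ZDD rule skips a node whose hi-branch is the 0-terminal (the variable must be false). This is a simplification for illustration, not the CUDD encoding.

```python
# Toy node constructors contrasting BDD vs. ZDD reduction rules.
def make_bdd(var, lo, hi, table):
    if lo == hi:                  # variable does not matter -> no node needed
        return lo
    return table.setdefault((var, lo, hi), (var, lo, hi))

def make_zdd(var, lo, hi, table):
    if hi == 0:                   # variable must be false -> no node needed
        return lo
    return table.setdefault((var, lo, hi), (var, lo, hi))
```

In sparse Kripke models where most atoms are false at most states, the ZDD rule fires far more often than the BDD rule, which is the asymmetry the title refers to.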